OSINT gathering can be a daunting task, but with a framework, a plan and the right techniques, you can find the information you need quickly.

OSINT refers to the practice of collecting information from publicly available sources to support intelligence needs. OSINT findings are raw data: they must be interpreted and analyzed before they become actionable intelligence.

OSINT is used in many scenarios by many different user groups, and it has become increasingly important for the following reasons:

  • Increasing online data volume: The large volume of data available online, such as on social media platforms and in public records, provides a rich and ever-growing source of OSINT.
  • Support the decision-making process: OSINT allows decision-makers to make informed decisions based on data collected from public sources. For instance, a U.S. company that wants to invest in an Asian country can leverage OSINT to check local competitors' capabilities, view industry trends, perform due diligence on partners and assess the general investment climate in that country before investing.
  • Manage risk: OSINT techniques and tools can reveal potential threats and vulnerabilities in an organization's IT environment so they can be fixed before threat actors exploit them. The same strategy applies to business partners and vendors: OSINT can reveal important information about third-party vendors so that their weaknesses cannot be exploited in an attack on the organization's IT environment.
  • Verify facts: OSINT can be used to cross-check facts across different sources to identify fake and misleading information.

OSINT gathering framework

An OSINT gathering framework refers to the set of procedures and best practices used by OSINT gatherers to collect and disseminate information from publicly available sources. A typical OSINT framework consists of the following main phases:

  • Identify intelligence requirements: Identify the entities (individuals or companies) you are going to gather information about and specify what data needs to be collected about them.
  • Collect data: Gather data about the target entities from a wide range of publicly available sources, including government databases, social media platforms and website content such as articles and press releases.
  • Use tools: Use different tools to gather and analyze data, such as web scrapers for collecting data and EXIF tools for examining the metadata of collected digital files (such as images and videos). Tools help streamline and automate the collection and analysis of large data sets.
  • Evaluate collected data: It is critical to ensure that the collected data is authentic, accurate and relevant to the intelligence issue. One way to achieve this is to verify each piece of information against two quality sources.
  • Process collected information: We need to structure data from various sources to identify connections between different entities. Software can be used in this phase to discover hidden connections and gain more insight into the collected data, giving us a holistic and comprehensive view of the investigated subject.
  • Reporting: We need to present our findings in a formal report that lists all key facts and delivers the finished intelligence to decision-makers in a clear, understandable form so they can act upon it.
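To illustrate the processing phase, the following Python sketch links records that share an identifier, such as an email address or company name, into groups of related entities using a union-find structure. The record names, identifier values and input format (a dict mapping a record ID to a set of identifiers) are hypothetical; dedicated link-analysis software goes much further, for example by weighting and visualizing the links.

```python
from collections import defaultdict


def link_entities(records):
    """Group record IDs that share any identifier value (email, phone, name...).

    `records` maps a record ID to a set of identifier strings. This input
    format is a hypothetical choice made for illustration.
    """
    # Union-find: every record starts as its own group.
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Remember the first record that carried each normalized identifier.
    first_seen = {}
    for rid, identifiers in records.items():
        for value in identifiers:
            key = value.strip().lower()  # simple normalization before comparing
            if key in first_seen:
                union(first_seen[key], rid)
            else:
                first_seen[key] = rid

    groups = defaultdict(set)
    for rid in records:
        groups[find(rid)].add(rid)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical records gathered from different sources.
records = {
    "linkedin_profile": {"j.doe@example.com", "+1-555-0100"},
    "domain_whois": {"J.Doe@example.com", "acme-holdings"},
    "forum_account": {"jd_1984"},
    "company_registry": {"acme-holdings"},
}
print(link_entities(records))
```

Here the shared email address ties the profile to the WHOIS record, and the shared company name then pulls in the registry entry, a connection no single source shows on its own. Exact matching after trimming and lower-casing is a deliberate simplification; real data usually needs fuzzier matching.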

During all phases of the OSINT gathering process, it is vital to comply with applicable privacy regulations and to avoid breaching any law or exposing individuals' private information in any way.

Creating a plan for OSINT gathering

Like the OSINT framework, the OSINT search plan will ensure you follow a systematic methodology to collect and analyze information. However, the OSINT gathering plan is more detailed in terms of identifying the technical, operational and analytical disciplines involved in each phase of the OSINT gathering process. Here is a proposed OSINT search plan:

Identify intelligence requirements

The first step is to identify the entities we need to inspect during our research, whether they are individuals or organizations. We also need to determine exactly what type of information to collect about them, such as background information about individuals, or the network topology and cyber assets if the target is an organization.

Create a research plan

Now, we need to identify the main tasks that make up our research. It is better to divide these tasks into smaller, more manageable mini-tasks and prioritize them according to their importance.
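A research plan can be kept as a simple prioritized task list. This minimal Python sketch assumes a hypothetical Task record with a numeric priority (1 being most important) and lists the open mini-tasks in priority order; the task names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int  # 1 = most important (illustrative scale)
    done: bool = False


def next_tasks(tasks):
    """Return the names of open tasks, most important first.

    sorted() is stable, so tasks with equal priority keep their original order.
    """
    return [t.name for t in sorted(tasks, key=lambda t: t.priority) if not t.done]


# Hypothetical breakdown of "profile the target company" into mini-tasks.
plan = [
    Task("Collect documents posted on the company website", 2),
    Task("List official social media accounts", 1),
    Task("Look up the business registry entry", 1, done=True),
    Task("Review news coverage from the last two years", 3),
]
for step, name in enumerate(next_tasks(plan), start=1):
    print(step, name)
```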

Gather basic information

Next, we need to begin collecting basic information about our subject. For example, if we are researching a company, reading information on its website is a good start.

We can also search for the target person's or organization's name on social media platforms and general-purpose search engines to gather basic information about the target.

We should also consider searching for digital files related to the target entity that have been posted online. For example, when gathering information about a company, we should search for MS Office files, PDF documents and media files (images and videos) related to this company. Inspecting their metadata can reveal important information about the company's IT infrastructure (type of hardware and software used), its employees and their contact information (email addresses, phone numbers).
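Office documents in the .docx, .xlsx and .pptx formats are ZIP archives whose metadata lives in the docProps/core.xml and docProps/app.xml parts. The Python sketch below reads a few common fields using only the standard library. The demo archive and its values are hypothetical, and dedicated tools such as ExifTool cover many more formats, including PDFs and images.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the Office Open XML metadata parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Output field -> (archive part, element path inside that part).
FIELDS = {
    "author": ("docProps/core.xml", "dc:creator"),
    "last_modified_by": ("docProps/core.xml", "cp:lastModifiedBy"),
    "created": ("docProps/core.xml", "dcterms:created"),
    "modified": ("docProps/core.xml", "dcterms:modified"),
    "application": ("docProps/app.xml", "ep:Application"),
    "app_version": ("docProps/app.xml", "ep:AppVersion"),
    "company": ("docProps/app.xml", "ep:Company"),
}


def office_metadata(path_or_file):
    """Return the metadata fields present in a .docx/.xlsx/.pptx file."""
    result = {}
    with zipfile.ZipFile(path_or_file) as archive:
        names = set(archive.namelist())
        roots = {}
        for field, (part, path) in FIELDS.items():
            if part not in names:
                continue  # some files omit a metadata part entirely
            if part not in roots:
                roots[part] = ET.fromstring(archive.read(part))
            element = roots[part].find(path, NS)
            if element is not None and element.text:
                result[field] = element.text.strip()
    return result


# Demo: build a minimal in-memory archive with hypothetical metadata values.
core_xml = (
    f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}" '
    f'xmlns:dcterms="{NS["dcterms"]}">'
    "<dc:creator>Jane Analyst</dc:creator>"
    "<cp:lastModifiedBy>jdoe</cp:lastModifiedBy>"
    "<dcterms:created>2023-03-01T09:15:00Z</dcterms:created>"
    "</cp:coreProperties>"
)
app_xml = (
    f'<Properties xmlns="{NS["ep"]}">'
    "<Application>Microsoft Office Word</Application>"
    "<AppVersion>16.0000</AppVersion>"
    "</Properties>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docProps/core.xml", core_xml)
    archive.writestr("docProps/app.xml", app_xml)

sample = office_metadata(buffer)
print(sample)
```

For a real file, pass its path instead, for example office_metadata("report.docx") with a hypothetical file name. Fields such as the author and last-modified-by names often expose employee names or usernames, while the application and version hint at the software in use. Fields the file lacks are simply omitted from the result.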

Social media intelligence

Social media platforms such as Facebook, Twitter and Instagram have become extremely widespread; it is now rare to find an internet user without at least one social media account. Organizations also use social media platforms to promote their services and interact with customers. Analyzing a target's social media profiles helps us identify their relationships and connections with other people and entities.

Search the deep and dark web

Some information may be hidden in the deep web or the dark web. The deep web contains data that typical search engines cannot index. Examples include government databases (court records, birth/death certificates), commercial databases (business registries, property records) and websites that require registration. These deep web sources should be thoroughly searched for relevant intelligence.

Dark web networks such as Tor may contain valuable information. While leaked information on darknets about a target's business connections or online accounts may provide helpful intelligence, extreme caution should be taken before using it in your investigation. It is advisable to vet any dark web source for authenticity, legitimacy and compliance with the privacy laws in your jurisdiction before using such data as OSINT.

Use different tools and techniques

The volume of digital data is increasing at an explosive rate. This benefits OSINT gatherers because they can find more information about their targets, but it also means they are bombarded with massive amounts of data that need analysis and verification. Using automated tools across all OSINT gathering phases helps investigators speed up the search process and quickly decide which information is relevant to the intelligence issue.

Verify and cross-reference collected information

After collecting data from different sources, it's time to cross-reference it and ensure that it is accurate and relevant to the intelligence case. Misleading and fake data is prevalent across the web, and intentionally misleading data must be discarded.

Verifying collected data not only helps identify misleading data but also establishes confidence in the collected data and spots gaps between different data sources, which can reveal unknown areas that should be investigated further.
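The cross-referencing step can be sketched in Python as follows. This is a minimal example under stated assumptions: each observation is a hypothetical (source, subject, attribute, value) tuple, values are compared after trimming and lower-casing, and a claim counts as confirmed when at least two distinct sources agree. Agreement alone does not prove independence, since one source may simply copy another.

```python
from collections import defaultdict


def cross_reference(observations, min_sources=2):
    """Classify each (subject, attribute) claim by how well it is corroborated.

    observations: iterable of (source, subject, attribute, value) tuples.
    Returns a dict mapping (subject, attribute) to one of:
      "confirmed"  - one value, reported by at least `min_sources` distinct sources
      "conflict"   - different sources report different values (a gap to investigate)
      "unverified" - one value, but fewer than `min_sources` sources
    """
    # (subject, attribute) -> normalized value -> set of sources reporting it
    values = defaultdict(lambda: defaultdict(set))
    for source, subject, attribute, value in observations:
        values[(subject, attribute)][value.strip().lower()].add(source)

    verdicts = {}
    for claim, by_value in values.items():
        if len(by_value) > 1:
            verdicts[claim] = "conflict"
        else:
            (sources,) = by_value.values()
            verdicts[claim] = "confirmed" if len(sources) >= min_sources else "unverified"
    return verdicts


# Hypothetical observations about a hypothetical company.
observations = [
    ("company_website", "Acme Ltd", "headquarters", "Berlin"),
    ("business_registry", "Acme Ltd", "headquarters", "berlin"),
    ("news_article", "Acme Ltd", "ceo", "A. Smith"),
    ("social_media", "Acme Ltd", "ceo", "B. Jones"),
    ("forum_post", "Acme Ltd", "employees", "250"),
]
for claim, verdict in cross_reference(observations).items():
    print(claim, verdict)
```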

Ethical considerations

Throughout the OSINT gathering phases, researchers should be extremely careful not to breach any copyright or privacy law. For example, leaked data such as personally identifiable information (PII), financial information and patient medical records, as well as leaked credentials (usernames and passwords), should not be included in the final OSINT report unless there is a genuine need and ethical and legal standards have been strictly followed.

Report your findings

The final phase involves listing key findings in a formal report. The report may include visualizations such as graphs, web resource screen captures and links to other documents.

OSINT techniques

There is a plethora of tools and online services for collecting and analyzing publicly available data. Here are the main techniques:

Search engines

Search engines such as Google, Yahoo! and Bing are our window to the surface web. Advanced search operators, such as those used in Google dorks, can help find more precise information.
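As an example of combining operators, this Python sketch assembles a query string from operators that Google supports: quoted phrases, site:, filetype:, intitle: and a leading minus sign for exclusion. The build_dork helper and the example domain are hypothetical, and other search engines support similar but not identical syntax.

```python
from urllib.parse import urlencode


def build_dork(terms=(), site=None, filetype=None, intitle=None, exclude=()):
    """Assemble a search query string from common advanced search operators."""
    parts = []
    for term in terms:
        # Quote multi-word phrases so they are matched exactly.
        parts.append(f'"{term}"' if " " in term else term)
    if site:
        parts.append(f"site:{site}")  # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict results to one file type
    if intitle:
        parts.append(f'intitle:"{intitle}"' if " " in intitle else f"intitle:{intitle}")
    for word in exclude:
        parts.append(f"-{word}")  # drop results containing this word
    return " ".join(parts)


query = build_dork(terms=["annual report"], site="example.com", filetype="pdf", exclude=["draft"])
print(query)
# URL-encode the query so it can be opened in a browser.
print("https://www.google.com/search?" + urlencode({"q": query}))
```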

Social media analysis

As mentioned earlier, social media platforms can reveal significant information about any entity. Automated tools exist for collecting data from social media platforms, and we can also use each platform's built-in search feature to refine our search results.

Website analysis

Many OSINT gathering activities involve investigating who owns a website. There are different tools and online services for doing this.
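A common starting point for website ownership is a WHOIS lookup. The Python sketch below sends a raw query as described in RFC 3912 (the query string followed by CRLF over TCP port 43) and parses the key/value lines of the response. It assumes whois.iana.org as the first server to ask; IANA's reply typically includes a refer: line naming the registry's WHOIS server, which can then be queried the same way. Many registries now redact registrant details, so results vary.

```python
import socket


def whois_query(domain, server="whois.iana.org", port=43, timeout=10.0):
    """Send a raw WHOIS query (RFC 3912) and return the server's text response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text):
    """Collect 'key: value' lines into a dict of lists (keys lower-cased)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comment/banner lines and lines without a key.
        if not line or line.startswith(("%", "#", ">>>")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value:
            fields.setdefault(key.strip().lower(), []).append(value)
    return fields


# Example usage (requires network access):
# iana = parse_whois(whois_query("example.com"))
# registry = parse_whois(whois_query("example.com", server=iana["refer"][0]))
```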

File metadata

Digital files may contain important information about the file's author, along with technical details about the file itself, such as its creation, modification and access times, the software version and type of operating system used and, if the file is an image, geolocation data (such as GPS coordinates) and the type of capturing device.
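EXIF stores GPS latitude and longitude as three rational numbers (degrees, minutes, seconds) plus a reference letter in the GPSLatitudeRef and GPSLongitudeRef tags. This Python sketch converts such values into signed decimal degrees that can be pasted into a map. It assumes the values have already been extracted with an EXIF reader, and the sample coordinates are hypothetical.

```python
from fractions import Fraction


def gps_to_decimal(dms, ref):
    """Convert EXIF-style GPS (degrees, minutes, seconds) to signed decimal degrees.

    dms: three numbers; EXIF stores each as a rational, but Fraction,
         float or int values all work here.
    ref: "N", "S", "E" or "W" (from GPSLatitudeRef / GPSLongitudeRef).
    """
    degrees, minutes, seconds = (float(v) for v in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal notation.
    return -value if ref.upper() in ("S", "W") else value


# Hypothetical values, as an EXIF reader might return them.
lat = gps_to_decimal((Fraction(40), Fraction(26), Fraction(46302, 1000)), "N")
lon = gps_to_decimal((Fraction(79), Fraction(58), Fraction(56)), "W")
print(f"{lat:.6f}, {lon:.6f}")
```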

Email header analysis

If the OSINT case involves emails, we should investigate their headers, which can reveal the sender's IP address and other technical information about the sender.
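The Received headers are usually the most informative part. Each mail server that relays a message prepends its own Received header, so reading them from the bottom up follows the message from the sender toward the recipient. This Python sketch uses only the standard email, re and ipaddress modules to list the bracketed IPv4 addresses in that order. The sample message and addresses are hypothetical. Headers added before the first server you trust can be forged, and many webmail services do not record the sender's own IP address, so treat the result as a lead rather than proof.

```python
import ipaddress
import re
from email import message_from_string

# IPv4 addresses written in square brackets, as most mail servers record them.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_hops(raw_message):
    """Return IPv4 addresses from Received headers, ordered oldest hop first."""
    msg = message_from_string(raw_message)
    hops = []
    # Each relay prepends its header, so the bottom-most one is closest to the sender.
    for header in reversed(msg.get_all("Received", [])):
        for candidate in BRACKETED_IPV4.findall(str(header)):
            try:
                hops.append(str(ipaddress.ip_address(candidate)))
            except ValueError:
                continue  # matched the pattern but is not a valid address, e.g. 999.1.1.1
    return hops


# Hypothetical message with two relay hops.
RAW = (
    "Received: from mail.example.net (mail.example.net [198.51.100.7])\n"
    "\tby mx.example.org; Tue, 2 May 2023 10:00:02 +0000\n"
    "Received: from [192.168.1.20] (unknown [203.0.113.45])\n"
    "\tby mail.example.net; Tue, 2 May 2023 10:00:00 +0000\n"
    "From: sender@example.com\n"
    "Subject: Quarterly update\n"
    "\n"
    "Body text.\n"
)
print(received_hops(RAW))
```

A private address such as 192.168.1.20 (which ipaddress reports through its is_private attribute) points to the sender's internal network rather than a public location; the next address in the chain is usually the more useful one.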

Web scraping

A web scraper is an automated tool used to extract content from websites, and there are numerous web scrapers that OSINT analysts can use.

Before scraping a website, be careful not to breach its terms of use, as many websites prohibit scraping their content.
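A quick courtesy check before scraping is to honor the site's robots.txt file. The Python sketch below uses only the standard library to extract links from a page and keep those on the same host that robots.txt allows for a hypothetical user agent named osint-research-bot. Note that robots.txt is not the website usage agreement: a page it allows may still be off limits under the site's terms, which must be checked separately. The page, rules and URLs are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def allowed_links(html, base_url, robots_lines, user_agent="osint-research-bot"):
    """Return same-host links from `html` that robots.txt permits for `user_agent`."""
    robots = RobotFileParser()
    robots.parse(robots_lines)
    # Mark the rules as loaded; can_fetch() denies every URL until a check time is recorded.
    robots.modified()
    extractor = LinkExtractor(base_url)
    extractor.feed(html)
    extractor.close()
    host = urlparse(base_url).netloc
    # robots.txt only governs its own host, so links to other sites are skipped.
    return [
        url for url in extractor.links
        if urlparse(url).netloc == host and robots.can_fetch(user_agent, url)
    ]


robots_txt = "User-agent: *\nDisallow: /private/\n".splitlines()
page = (
    '<a href="/about">About</a> '
    '<a href="/private/staff.html">Staff</a> '
    '<a href="https://other.example/x">Elsewhere</a>'
)
print(allowed_links(page, "https://www.example.com/", robots_txt))
```

In practice, the robots.txt lines and the page would be downloaded first (for example with urllib.request), with a delay between requests.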

Government databases

Public databases can prove helpful in OSINT gathering. Here are some examples of such databases:

  • Property records: such as real estate ownership and property assessment records
  • Corporate records: such as business registries
  • Court documents: such as criminal records, pending lawsuits and court dockets
  • Licenses: permits given to businesses to operate in specific locations or fields
  • Vital records: such as marriage, divorce, birth and death records
  • Public directories: public listings of phone numbers, email addresses and mailing addresses

Media monitoring

Searching news articles, blogs and reader comments on news and media websites can reveal important information about the investigated entity.
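News monitoring can be partly automated with RSS feeds, which many news sites and blogs publish. This Python sketch scans an RSS 2.0 document for items whose title or description mentions a keyword. The feed content and entity name are hypothetical; a live feed could be fetched with urllib.request.urlopen before being passed in. Plain substring matching is a deliberate simplification and can produce false positives.

```python
import xml.etree.ElementTree as ET


def matching_items(rss_xml, keywords):
    """Return (title, link) for RSS 2.0 items whose title or description mentions a keyword."""
    root = ET.fromstring(rss_xml)
    wanted = [k.lower() for k in keywords]  # case-insensitive matching
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        text = f"{title} {description}".lower()
        if any(k in text for k in wanted):
            hits.append((title, item.findtext("link", default="")))
    return hits


# Hypothetical feed content.
sample_feed = """<rss version="2.0"><channel><title>Local News</title>
<item><title>Acme Ltd opens new plant</title><link>https://news.example/1</link>
<description>The company expands production.</description></item>
<item><title>Weather update</title><link>https://news.example/2</link>
<description>Rain expected.</description></item>
<item><title>Market roundup</title><link>https://news.example/3</link>
<description>Shares of ACME LTD rose 4%.</description></item>
</channel></rss>"""
print(matching_items(sample_feed, ["Acme Ltd"]))
```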

Dark web search

Dark web networks such as Tor can contain information relevant to our investigation, such as leaked data and any mentions of the investigated entity.

Other technologies that can be used during an OSINT investigation include satellite imagery, which can show locations or the activities conducted there at a specific point in time.

OSINT challenges

Despite the numerous benefits of OSINT, several challenges can make OSINT gathering very difficult. Here are the main ones:

  • Sheer volume of data: The volume of digital data is increasing at an extraordinary rate. For instance, the overall amount of data generated worldwide is projected to reach 158 zettabytes by 2025. Extracting intelligence from such a massive amount of online data is very challenging.
  • Disinformation: The amount of misleading information online is increasing rapidly. OSINT gatherers must therefore verify information against more than one source, which increases the overall investigation time.
  • Compliance risks: In some countries, gathering information about people can be risky; for example, the GDPR restricts the collection of personal data about individuals in the EU. Even though information on social media platforms is considered public, exploiting it in intelligence scenarios can be risky and may result in lawsuits in some jurisdictions.
  • Data complexity: Combining data collected from a wide range of sources into one finished intelligence product with an organized timeline is very challenging, especially given the growing amount of online data and disinformation.

Gathering OSINT safely

Maintaining researcher anonymity is extremely important when gathering OSINT data. Revealing the gatherer's identity can have serious consequences, compromising both their safety and the intelligence operation itself.

For example, if a researcher is investigating a drug dealer using OSINT techniques and the dealer discovers they are being tracked online, the dealer may conceal their activities or even send someone after the researcher.

The best practice is to use browser isolation solutions like Authentic8 (Silo for Research) that separate browsing activity from the end user's device during OSINT searches. This prevents browser-based data leaks from tying activity directly to the analyst. Investigating from an isolated, anonymous environment is essential for conducting OSINT securely without exposing the researcher to unnecessary risk.
